06. TD Prediction: Action Values

Like TD(0) for state values, this method for estimating action values is guaranteed to converge to the true action-value function of the policy being evaluated, as long as the step-size parameter \alpha is sufficiently small and every state-action pair continues to be visited.
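
To make the update concrete, here is a minimal Python sketch of one-step TD prediction for action values under a fixed policy. It assumes the standard update Q(S, A) ← Q(S, A) + \alpha (R + \gamma Q(S', A') - Q(S, A)). The function names, the `env_step` and `policy` interfaces, and the three-state chain environment are all hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def td_action_value_prediction(env_step, policy, start_state, num_episodes,
                               alpha=0.1, gamma=1.0, seed=0):
    """Estimate action values of a fixed policy with one-step TD updates.

    env_step(state, action, rng) -> (next_state, reward, done)  # assumed interface
    policy(state, rng) -> action                                # assumed interface
    """
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(num_episodes):
        state = start_state
        action = policy(state, rng)
        done = False
        while not done:
            next_state, reward, done = env_step(state, action, rng)
            if done:
                # Terminal transitions have no successor value.
                target = reward
            else:
                next_action = policy(next_state, rng)
                target = reward + gamma * Q[(next_state, next_action)]
            # Move the estimate a fraction alpha toward the TD target.
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            if not done:
                state, action = next_state, next_action
    return Q


# Hypothetical toy environment: states 0, 1, 2 in a chain, state 3 is terminal.
# "right" moves one step toward the terminal state; every step costs -1.
def chain_step(state, action, rng):
    next_state = state + 1 if action == "right" else state
    return next_state, -1.0, next_state == 3


# Hypothetical fixed policy to evaluate: always move right.
def always_right(state, rng):
    return "right"


Q = td_action_value_prediction(chain_step, always_right,
                               start_state=0, num_episodes=500)
for s in range(3):
    print(s, round(Q[(s, "right")], 3))
```

Under this policy with \gamma = 1, the true action values for moving right from states 0, 1, and 2 are -3, -2, and -1, so the printed estimates should approach those numbers. Pairs the policy never visits, such as any other action, keep their initial value of 0, which is why continued visits to every pair matter for convergence.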